42 ◾ Bioinformatics
The FASTX-toolkit tools listed in Table 1.4 are used for quality assessment and quality
adjustment. Their major limitation is that “fastq_quality_filter” does not process the two
paired-end FASTQ files together, so filtering usually leaves singletons (reads whose mates
were removed) in one or both of the paired-end FASTQ files. Most aligners will not process
paired-end FASTQ files that contain singletons. The FASTX-toolkit workaround for the
singleton problem is to mask the low-quality bases instead of removing the reads that
contain them. Thus, the “fastq_masker” program is used instead of “fastq_quality_filter” to
mask the bases whose Phred quality score is below a user-defined threshold, “-q”.
fastq_masker \
-q 20 \
-i bad.fastq \
-o bad_masked.fastq \
-Q33
fastqc bad_masked.fastq
firefox bad_masked_fastqc.html
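The above “fastq_masker” command masks the bases with a Phred quality score lower than
20 (“-q 20”) by replacing them with N, the default mask character; aligners and assemblers
then treat these positions as unknown bases. The “-Q33” option indicates that the quality
scores are encoded with an ASCII offset of 33.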
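The difference between masking and filtering can be illustrated with a minimal Python sketch. It is not the FASTX-toolkit implementation: the function names, the toy reads, and the read-level filter with its `min_percent` cutoff are assumptions made only for illustration. The sketch shows that masking keeps every read, whereas dropping reads changes the read count and can therefore leave the mate file with singletons.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (offset 33, as with -Q33)."""
    return [ord(symbol) - offset for symbol in quality]


def mask_low_quality(sequence, quality, threshold=20, mask_char="N"):
    """Replace each base whose Phred score is below the threshold with mask_char."""
    scores = phred_scores(quality)
    return "".join(
        mask_char if score < threshold else base
        for base, score in zip(sequence, scores)
    )


def keep_read(quality, threshold=20, min_percent=80):
    """Hypothetical read-level filter: keep a read only if enough bases pass."""
    scores = phred_scores(quality)
    passing = sum(score >= threshold for score in scores)
    return 100 * passing >= min_percent * len(scores)


# Toy reads as (id, sequence, quality); '#' encodes Phred 2 and 'I' encodes Phred 40.
reads = [
    ("read1", "ACGTACGT", "IIII##II"),
    ("read2", "TTGACCAA", "IIIIIIII"),
]

# Masking keeps every read, so a mate file would stay in sync.
masked = [(rid, mask_low_quality(seq, qual), qual) for rid, seq, qual in reads]

# Filtering drops read1, so its mate in the other file would become a singleton.
filtered = [read for read in reads if keep_read(read[2])]

for rid, seq, _ in masked:
    print(rid, seq)
print("reads:", len(reads), "masked:", len(masked), "filtered:", len(filtered))
```

In this toy input the two low-quality bases of read1 become N, while the filter removes read1 entirely.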
For paired-end FASTQ files produced by an Illumina instrument, there is another
FASTQ processing program, designed for Illumina sequence data, called
Trimmomatic [15]. It is a multithreaded, Java-based command-line program and is more
modern than FASTX-toolkit. It performs several operations,
including detecting and removing known adaptor fragments (adapter.clip), trimming
low-quality regions from the beginning of the reads (trim.leading), trimming
low-quality regions from the end of the reads (trim.trailing), and filtering out short reads
(min.read.length), in addition to other operations with different quality-filtering strategies for
dropping low-quality bases in the reads (max.info and sliding.window). On the Trimmomatic
command line, these steps are named ILLUMINACLIP, LEADING, TRAILING, MINLEN, MAXINFO,
and SLIDINGWINDOW, respectively. Trimmomatic can
be used in two modes: simple and palindrome. In simple mode, adaptor sequences are
removed by a pairwise local alignment between the adaptor sequence and each read, which
scans the read from the 5′ end to the 3′ end using a seed-and-extend approach. If the score of
a match exceeds a user-defined threshold, both the matched region and the region after the
alignment are removed. The entire read is removed if the alignment covers the whole read.
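The clipping rule of the simple mode can be sketched in Python as follows. This is a simplified stand-in, not Trimmomatic's algorithm: it uses an ungapped scan instead of seed-and-extend local alignment, and the function name, the scoring values (`match`, `mismatch`), the threshold `min_score`, and the toy sequences are all assumptions chosen for illustration.

```python
def clip_adapter(read, adapter, min_score=5, match=1, mismatch=-2):
    """Clip the adapter and everything after it, scanning from the 5' end.

    At each start offset the adapter is compared to the read without gaps.
    The first offset whose score reaches min_score is treated as the adapter
    start. An empty string means the whole read was removed.
    """
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        score = sum(
            match if read[start + i] == adapter[i] else mismatch
            for i in range(overlap)
        )
        if score >= min_score:
            return read[:start]
    return read


adapter = "AGATCGGAAGAGC"   # illustrative adapter sequence
insert = "ACGTTGCAAGGCT"    # toy genomic insert

# Ten adapter bases follow the insert: the match scores 10 and is clipped.
long_hit = clip_adapter(insert + adapter[:10], adapter)

# Only three adapter bases follow the insert: the best score is 3, below
# min_score, so the read is returned unchanged.
short_hit = clip_adapter(insert + adapter[:3], adapter)

# The read consists only of adapter: the match starts at offset 0.
adapter_only = clip_adapter(adapter, adapter)

print(repr(long_hit), repr(short_hit), repr(adapter_only))
```

With these toy values, a three-base adapter remnant cannot reach the threshold, so it survives clipping.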
The simple mode may fail to detect a short adaptor fragment, because a short match cannot
reach the score threshold. Therefore, the palindrome mode is used, because it is able to detect
and remove short adaptor fragments. The palindrome mode is used only for paired-end data.
When the sequenced fragment is shorter than the read length, the forward and reverse reads
contain an equal number of valid bases, and the valid part of each read is the reverse
complement of the other. In each read, the valid bases are followed by contaminating bases
from the adaptors. The tool uses the two complementary reads to identify the adaptor
fragment or any other contaminating technical sequence by globally aligning the forward
and reverse reads. An alignment score that is greater than a user-defined threshold indicates
that the first parts of the two reads are reverse complements of one another, and the
remaining read fragments, which match the adaptor sequence, are removed.
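The idea behind the palindrome mode can be sketched in Python as follows. This is a simplified illustration, not Trimmomatic's algorithm: it replaces the scored global alignment with an ungapped comparison that counts mismatches, and the function names, the parameters `min_overlap` and `max_mismatches`, and the toy sequences are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def palindrome_clip(forward, reverse, min_overlap=8, max_mismatches=1):
    """Clip both reads to the insert when they read through into the adapter.

    For each candidate insert length, the first bases of the forward read are
    compared with the reverse complement of the first bases of the reverse
    read. If they agree, everything after that length is treated as adapter.
    """
    longest = min(len(forward), len(reverse))
    for insert_len in range(longest, min_overlap - 1, -1):
        candidate = forward[:insert_len]
        mate = reverse_complement(reverse[:insert_len])
        mismatches = sum(a != b for a, b in zip(candidate, mate))
        if mismatches <= max_mismatches:
            return forward[:insert_len], reverse[:insert_len]
    return forward, reverse


insert = "ACGTTGCAAGGCTTAC"   # toy insert, shorter than the read length
remnant = "AGATC"             # short adapter remnant, illustrative only

forward_read = insert + remnant
reverse_read = reverse_complement(insert) + remnant

clipped_forward, clipped_reverse = palindrome_clip(forward_read, reverse_read)
print(clipped_forward, clipped_reverse)
```

In this toy pair the five-base adapter remnant is removed from both reads, even though it is too short to be detected by matching the adapter sequence alone.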